Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
We consider the paradigm of a black box AI system that makes life-critical
decisions. We propose an "arguing machines" framework that pairs the primary AI
system with a secondary one that is independently trained to perform the same
task. We show that disagreement between the two systems, without any knowledge
of underlying system design or operation, is sufficient to arbitrarily improve
the accuracy of the overall decision pipeline given human supervision over
disagreements. We demonstrate this system in two applications: (1) an
illustrative example of image classification and (2) large-scale real-world
semi-autonomous driving data. In the first application, the framework reduces
top-5 error on ImageNet from 8.0% to 2.8%. In the second application, we apply
the framework to Tesla Autopilot and demonstrate the ability to predict 90.4%
of system disengagements that were labeled by human annotators as challenging
and needing human supervision.
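The disagreement-based supervision described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `arbitrate` and `pipeline_error`, the use of plain label equality as the disagreement test, and the assumption that the human supervisor always answers correctly are all hypothetical choices made for the example.

```python
from typing import Callable, Sequence, Tuple


def arbitrate(primary: str, secondary: str,
              ask_human: Callable[[str, str], str]) -> Tuple[str, bool]:
    """Return (final_decision, escalated).

    The two systems are treated as black boxes: only their outputs are compared.
    Agreement passes the primary decision through; disagreement is escalated
    to the human supervisor (assumed interface: ask_human(primary, secondary)).
    """
    if primary == secondary:
        return primary, False
    return ask_human(primary, secondary), True


def pipeline_error(primary: Sequence[str], secondary: Sequence[str],
                   truth: Sequence[str]) -> Tuple[float, float]:
    """Return (error_rate, escalation_rate) with an idealized human.

    Assumption for illustration only: the human always gives the true label.
    """
    errors = 0
    escalations = 0
    for p, s, t in zip(primary, secondary, truth):
        decision, escalated = arbitrate(p, s, lambda _p, _s, t=t: t)
        errors += decision != t
        escalations += escalated
    n = len(truth)
    return errors / n, escalations / n
```

Under these assumptions, the only remaining errors are cases where both systems agree on the same wrong answer, which is why independently trained systems are useful here.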
Learning of Identity from Behavioral Biometrics for Active Authentication
In this work, we look into the problem of active authentication on desktop computers and mobile devices. Active authentication is the process of continuously verifying a person's identity based on the cognitive, behavioral, and physical aspects of their interaction with the device. We consider several representative modalities including keystroke dynamics, mouse movement, application usage patterns, web browsing behavior, GPS location, and stylometry. We implement a binary classifier for each modality and organize the classifiers as a parallel binary decision fusion architecture. The decisions of each classifier are fed into a decision fusion center (DFC), which applies the Chair-Varshney fusion rule to generate a global decision. The DFC minimizes the probability of error using estimates of each local classifier's false acceptance rate (FAR) and false rejection rate (FRR). We test our approach on two large datasets of 67 desktop computer users and 200 mobile device users. We are able to characterize the performance of the system with respect to intruder detection time and to quantify the contribution of each modality to the overall performance.
Ph.D., Computer Engineering -- Drexel University, 201
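The fusion step described above can be illustrated with a small sketch of the Chair-Varshney rule, which sums per-classifier log-likelihood ratios weighted by each classifier's error rates. The function name `chair_varshney_fuse`, the encoding of a local decision as 1 for "genuine user" and 0 for "intruder", and the `prior_genuine` parameter are assumptions made for this example; the thesis may use different conventions.

```python
import math
from typing import Sequence, Tuple


def chair_varshney_fuse(decisions: Sequence[int],
                        far: Sequence[float],
                        frr: Sequence[float],
                        prior_genuine: float = 0.5) -> Tuple[bool, float]:
    """Fuse binary local decisions into one global decision.

    decisions[i] is 1 if classifier i accepts the user as genuine, else 0.
    far[i] = P(accept | intruder), frr[i] = P(reject | genuine).
    Returns (accept, log_likelihood_ratio); accept is True when the
    evidence favors the genuine-user hypothesis.
    """
    # Start from the prior log-odds of the genuine-user hypothesis.
    llr = math.log(prior_genuine / (1.0 - prior_genuine))
    for u, fa, fr in zip(decisions, far, frr):
        if u:
            # An accept is more convincing when FAR is low.
            llr += math.log((1.0 - fr) / fa)
        else:
            # A reject is more convincing when FRR is low.
            llr += math.log(fr / (1.0 - fa))
    return llr > 0.0, llr
```

With this weighting, a single highly reliable classifier can overrule several weak ones: a reject from a classifier with FAR = FRR = 0.01 outweighs accepts from two classifiers with FAR = FRR = 0.4.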